Illustrated review of convergence conditions of the value iteration algorithm and the rolling horizon procedure for average-cost MDPs
Authors
Abstract
This paper is concerned with the links between the Value Iteration algorithm and the Rolling Horizon procedure for solving problems of stochastic optimal control under the long-run average criterion, in Markov Decision Processes with finite state and action spaces. We review conditions from the literature which imply the geometric convergence of Value Iteration to the optimal value. Aperiodicity is an essential prerequisite for convergence. We prove that the convergence of Value Iteration generally implies that of Rolling Horizon. We also present a modified Rolling Horizon procedure that can be applied to models without analyzing periodicity, and discuss the impact of this transformation on convergence. We illustrate the different convergence results with numerous examples.

Keywords: Markov decision problems, value iteration, heuristic methods, rolling horizon.

∗ CONICET UNR, Argentina
† CONICET UNR, Argentina
‡ INRIA and LIRMM, CNRS/Université Montpellier 2, 161 Rue Ada, F-34392 Montpellier, [email protected]

inria-00617271, version 1, 26 Aug 2011
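The two central objects of the abstract — Value Iteration under the average-cost criterion and the aperiodicity transformation that lets the procedure run without first analyzing periodicity — can be sketched as follows. This is a minimal illustration, not the paper's method: the 3-state, 2-action MDP data is invented for the example, and relative value iteration is used as the standard normalized form of Value Iteration. Note that under action 0 the original chain is a deterministic 3-cycle (periodic), which is exactly the situation the transformation resolves.

```python
import numpy as np

# Illustrative 3-state, 2-action average-cost MDP (made-up data, not from the paper).
# P[a] is the transition matrix under action a; c[a] the per-state cost vector.
P = np.array([
    [[0.0, 1.0, 0.0],   # action 0: deterministic cycle 0 -> 1 -> 2 -> 0 (periodic!)
     [0.0, 0.0, 1.0],
     [1.0, 0.0, 0.0]],
    [[0.5, 0.5, 0.0],   # action 1: an aperiodic alternative
     [0.1, 0.8, 0.1],
     [0.3, 0.3, 0.4]],
])
c = np.array([
    [2.0, 1.0, 3.0],    # costs under action 0
    [1.5, 2.5, 2.0],    # costs under action 1
])

def aperiodicity_transform(P, tau=0.5):
    """Standard aperiodicity transformation: P_tau = tau * P + (1 - tau) * I.
    Every state gains a self-loop, so each chain becomes aperiodic, while the
    stationary distribution (hence the average cost) of every stationary
    policy is unchanged."""
    n = P.shape[-1]
    return tau * P + (1.0 - tau) * np.eye(n)

def relative_value_iteration(P, c, iters=500):
    """Normalized Value Iteration for the average-cost criterion: subtracting
    the first component at each step keeps the iterates bounded, and the
    Bellman update at a fixed point yields the optimal average cost (gain)."""
    n = P.shape[-1]
    v = np.zeros(n)
    for _ in range(iters):
        Tv = (c + P @ v).min(axis=0)    # one Bellman (value-iteration) step
        v = Tv - Tv[0]                  # normalize away the additive constant
    Tv = (c + P @ v).min(axis=0)
    gain = Tv[0]                        # estimate of the optimal average cost
    policy = (c + P @ v).argmin(axis=0)
    return gain, v, policy

Pt = aperiodicity_transform(P)          # run VI on the transformed, aperiodic model
gain, v, policy = relative_value_iteration(Pt, c)
print(f"estimated optimal average cost: {gain:.4f}, greedy policy: {policy}")
```

Running the same iteration on the untransformed `P` may oscillate because of the period-3 cycle under action 0; on `Pt` the iterates converge geometrically, which is the behavior the reviewed conditions guarantee.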
Similar resources
Uniform Convergence of Value Iteration Policies for Discounted Markov Decision Processes
This paper deals with infinite horizon Markov Decision Processes (MDPs) on Borel spaces. The objective function considered, induced by a nonnegative and (possibly) unbounded cost, is the expected total discounted cost. For each of the MDPs analyzed, the existence of a unique optimal policy is assumed. Conditions that guarantee both pointwise and uniform convergence on compact sets of the minimiz...
On the Convergence of Optimal Actions for Markov Decision Processes and the Optimality of (s, S) Inventory Policies
This paper studies convergence properties of optimal values and actions for discounted and average-cost Markov Decision Processes (MDPs) with weakly continuous transition probabilities and applies these properties to the stochastic periodic-review inventory control problem with backorders, positive setup costs, and convex holding/backordering costs. The following results are established for MDPs...
متن کاملAccelerated decomposition techniques for large discounted Markov decision processes
Many hierarchical techniques to solve large Markov decision processes (MDPs) are based on the partition of the state space into strongly connected components (SCCs) that can be classified into some levels. In each level, smaller problems named restricted MDPs are solved, and then these partial solutions are combined to obtain the global solution. In this paper, we first propose a novel algorith...
Asymptotic properties of constrained Markov Decision Processes
We present in this paper several asymptotic properties of constrained Markov Decision Processes (MDPs) with a countable state space. We treat both the discounted and the expected average cost, with unbounded cost. We are interested in (1) the convergence of finite horizon MDPs to the infinite horizon MDP, (2) convergence of MDPs with a truncated state space to the problem with infinite state space,...
Proof of Convergence for Evolutionary Policy Iteration under a Sampling Regime
This article extends the evolutionary policy selection algorithm of Chang et al. (2005, 2007), which was designed for use in infinite horizon Markov decision processes (MDPs) with a large action space, to a discrete stochastic optimization problem, in an algorithm called Evolutionary Policy Iteration-Monte Carlo (EPI-MC). EPI-MC allows EPI to be used in a setting with a finite decision (action) ...
Journal: Annals OR
Volume: 199, Issue: -
Pages: -
Publication date: 2012